Appendix: Scalable Neural Video Representations with Learnable Positional Features
We train the network with mean-squared error as the loss function, using the AdamW optimizer [27] with a learning rate of 0.01. Specifically, we first apply a 2-layer MLP to the output of the positional encoding layer, and then we stack 5 NeRV blocks with upscale factors 5, 3, 2, 2, 2, respectively. To be specific, on the UVG-HD benchmark, we set the number of levels to 15, the number of features per level to 2, the maximum number of entries per level to 2^24, and the coarsest resolution to 16.

Table 7: Decoding time of coordinate-based representations, measured in FPS (higher is better).
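The arithmetic behind the configuration above can be checked with a short sketch. It multiplies the stated upscale factors of the five NeRV blocks to obtain the total spatial upscaling, and computes the width of a multi-level encoding from the stated number of levels and features per level. Two points are assumptions not given in the text: the 9 x 16 starting feature map (chosen here because it maps to 1080 x 1920 under the stated factors), and the convention that per-level features are concatenated. The function names are hypothetical and used only for illustration.

```python
from math import prod

# Upscale factors of the five NeRV blocks, as stated in the text.
UPSCALE_FACTORS = [5, 3, 2, 2, 2]

# Multi-level grid settings stated for the UVG-HD benchmark.
NUM_LEVELS = 15
FEATURES_PER_LEVEL = 2


def total_upscale(factors):
    """Overall spatial upscaling obtained by stacking the blocks."""
    return prod(factors)


def output_size(base_hw, factors):
    """Output (height, width) after applying every upscale factor.

    base_hw is the feature-map size fed to the first block; the value
    used below is an assumption, not a number from the text.
    """
    scale = total_upscale(factors)
    height, width = base_hw
    return (height * scale, width * scale)


def encoding_width(num_levels, features_per_level):
    """Encoding dimension, assuming per-level features are concatenated."""
    return num_levels * features_per_level


if __name__ == "__main__":
    print("total upscale:", total_upscale(UPSCALE_FACTORS))
    print("output size from 9 x 16:", output_size((9, 16), UPSCALE_FACTORS))
    print("encoding width:", encoding_width(NUM_LEVELS, FEATURES_PER_LEVEL))
```

Under these assumptions, the five blocks upscale by a factor of 120 in each spatial dimension, so a 9 x 16 feature map reaches full HD resolution, and the encoding passed to the 2-layer MLP would have 30 channels.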